    Trajectory-Based Off-Policy Deep Reinforcement Learning

    Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high-variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods. Comment: Includes appendix. Accepted for ICML 201
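
    To make the ingredients above concrete, the following is a minimal sketch of importance-sampled reuse of stored rollouts under a Gaussian search distribution over the parameters of a deterministic policy, optimized with plain stochastic gradient descent. The toy rollout function, dimensionality, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch (not the paper's exact objective): reuse past rollouts via
# importance sampling under a Gaussian search distribution over the parameters
# of a deterministic policy (parameter-space exploration).
import torch
from torch.distributions import Normal

param_dim = 8                                   # size of the policy parameter vector (placeholder)
mean = torch.zeros(param_dim, requires_grad=True)
log_std = torch.zeros(param_dim, requires_grad=True)
opt = torch.optim.SGD([mean, log_std], lr=1e-2)

def rollout(theta):
    """Placeholder: run the deterministic policy with parameters `theta`
    on the task and return the trajectory return (a scalar)."""
    return -((theta - 1.0) ** 2).sum()          # toy quadratic "return"

buffer = []                                     # (theta, return, log-prob under sampling distribution)
for it in range(200):
    # Sample one parameter vector, roll it out, and store the rollout for reuse.
    dist = Normal(mean, log_std.exp())
    theta = dist.sample()
    buffer.append((theta, rollout(theta).detach(), dist.log_prob(theta).sum().detach()))

    # Off-policy objective: importance-weighted mean return over all stored rollouts.
    thetas = torch.stack([b[0] for b in buffer])
    returns = torch.stack([b[1] for b in buffer])
    logq = torch.stack([b[2] for b in buffer])
    logp = Normal(mean, log_std.exp()).log_prob(thetas).sum(-1)
    weights = (logp - logq).exp()               # importance ratios p_new / p_old
    loss = -(weights * returns).mean()          # maximize the weighted return

    opt.zero_grad(); loss.backward(); opt.step()
```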

    Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

    PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real-world problems. Comment: Accepted final version to appear in the 2017 IEEE International Conference on Robotics and Automation (ICRA).
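
    The core construction, casting a PID law as static state feedback on an augmented state so that the gains become a flat parameter vector for policy search, might be sketched as follows. The scalar tracking setup, plant, and gain values are illustrative assumptions rather than the paper's PILCO-based formulation.

```python
# Hedged sketch: a PID controller written as static state feedback on an augmented
# state, so its gains become a flat parameter vector that a model-based policy-search
# method (PILCO in the paper) could tune from rollout data. All values are placeholders.
import numpy as np

def augment_state(error, integral, prev_error, dt):
    """Augmented state z = [e, integral of e, de/dt]."""
    integral = integral + error * dt
    derivative = (error - prev_error) / dt
    return np.array([error, integral, derivative]), integral

def pid_policy(z, K):
    """PID as static linear state feedback u = K @ z with K = [Kp, Ki, Kd]."""
    return float(K @ z)

# Toy closed loop on a first-order plant; K is what policy search would tune.
K = np.array([2.0, 0.5, 0.1])
x, setpoint, integral, prev_error, dt = 0.0, 1.0, 0.0, 0.0, 0.01
for _ in range(1000):
    error = setpoint - x
    z, integral = augment_state(error, integral, prev_error, dt)
    u = pid_policy(z, K)
    x += dt * (-x + u)                  # simple first-order plant dynamics
    prev_error = error
```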

    Policy search for imitation learning

    Efficient motion planning and possibilities for non-experts to teach new motion primitives are key components for a new generation of robotic systems. In order to be applicable beyond the well-defined context of laboratories and the fixed settings of industrial factories, those machines have to be easily programmable, adapt to dynamic environments, and learn and acquire new skills autonomously. Reinforcement learning in principle solves those learning issues but suffers from the curse of dimensionality. When dealing with complex environments and highly agile hardware platforms like humanoid robots in large or possibly continuous state and action spaces, the reinforcement learning framework becomes computationally infeasible. In recent publications, parametrized policies have been employed to face this problem. One of them, Policy Improvement with Path Integrals (PI^2), has been derived by transforming the Hamilton-Jacobi-Bellman (HJB) equation of stochastic optimal control into a path integral using the Feynman-Kac theorem. Applications of PI^2 have so far been limited to Dynamic Movement Primitives (DMP) to parametrize the motion policy. Another policy parametrization, the formulation of motion primitives as the solution of an optimization-based planner, has been widely used in other fields (e.g. inverse optimal control) and offers compelling possibilities to formulate characteristic parts of a motion in an abstract sense without specifying too much problem-specific geometry. Imitation learning, or learning from demonstration, can be seen as a way to bootstrap the acquisition of new behavior and as an efficient way to guide the policy search in a desired direction. Nevertheless, due to imperfect demonstrations, which might be incomplete or contradictory, and also due to noise, the learned behavior might be insufficient. As observed in the animal kingdom, a final trial-and-error phase guided by the cost and reward of a specific behavior is necessary to obtain a successful behavior. Interestingly, the reinforcement learning framework might offer the tools to govern both learning methods at the same time: imitation learning can be reformulated as reinforcement learning under a specific reward function, allowing the combination of both learning methods. In this work, the concept of probability-weighted averaging of policy roll-outs as seen in PI^2 is combined with an optimization-based policy representation. The reinforcement learning toolbox and direct policy search are utilized in a way that allows both imitation learning based on arbitrary demonstration types and the imposition of additional objectives on the learned behavior. A black-box evolutionary algorithm, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which can be shown to be closely related to the approach in PI^2, is leveraged to explore the parameter space. This work experimentally evaluates the suitability of this algorithm for learning motion behavior on a humanoid upper-body robotic system. We focus on learning from different types of demonstrations. The formulation of the reward function for reinforcement learning is described, and multiple test scenarios in 2D and 3D are presented. Finally, the capability of this approach to learn and improve motion primitives is demonstrated on a real robotic system within an obstacle test scenario.
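
    The probability-weighted averaging of policy roll-outs mentioned above can be sketched in a few lines; the cost function, parameter dimension, and hyperparameters below are placeholders, and the thesis itself explores the parameter space with CMA-ES rather than this bare PI^2-style update.

```python
# Hedged sketch of a PI^2-style probability-weighted averaging update: perturb the
# policy parameters, roll out, and average the samples weighted by exponentiated
# negative cost. The cost function and all constants are placeholders.
import numpy as np

def cost(theta):
    """Placeholder rollout cost; an imitation term plus task objectives would go here."""
    return np.sum((theta - 0.5) ** 2)

theta = np.zeros(10)                    # current policy parameters
sigma, lam, K = 0.1, 0.05, 32           # exploration noise, temperature, number of roll-outs

for iteration in range(100):
    samples = theta + sigma * np.random.randn(K, theta.size)
    costs = np.array([cost(s) for s in samples])
    # Softmax over negative costs: low-cost roll-outs get exponentially more weight.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    theta = w @ samples                 # probability-weighted average of roll-outs
```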

    Probabilistic Recurrent State-Space Models

    State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g. LSTMs) have proved extremely successful in modeling complex time series data. Fully probabilistic SSMs, however, are often hard to train, even for smaller problems. To overcome this limitation, we propose a novel model formulation and a scalable training algorithm based on doubly stochastic variational inference and Gaussian processes. In contrast to existing work, the proposed variational approximation allows one to fully capture the latent state temporal correlations. These correlations are the key to robust training. The effectiveness of the proposed PR-SSM is evaluated on a set of real-world benchmark datasets in comparison to state-of-the-art probabilistic model learning methods. Scalability and robustness are demonstrated on a high-dimensional problem.
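
    A minimal generative sketch of the kind of recurrent state-space model PR-SSM targets is shown below: a latent state evolves through a stochastic transition and only noisy observations are available. The transition function and the identity emission are placeholders; in PR-SSM the transition is a sparse Gaussian process and training samples latent trajectories inside a doubly stochastic variational objective.

```python
# Hedged sketch of a probabilistic recurrent state-space model: stochastic latent
# transitions plus noisy observations. The transition below is a stand-in for the
# GP posterior used in PR-SSM.
import numpy as np

def transition(x, u):
    """Placeholder transition; in PR-SSM this would be a sparse GP sample."""
    return 0.9 * x + 0.1 * np.tanh(u)

def rollout_latent(x0, controls, proc_std=0.05, obs_std=0.1, rng=np.random.default_rng(0)):
    """Sample one latent trajectory and its noisy observations."""
    xs, ys, x = [], [], x0
    for u in controls:
        x = transition(x, u) + proc_std * rng.standard_normal(x.shape)  # stochastic latent step
        xs.append(x)
        ys.append(x + obs_std * rng.standard_normal(x.shape))           # noisy (identity) emission
    return np.array(xs), np.array(ys)

latents, observations = rollout_latent(np.zeros(2), np.linspace(-1.0, 1.0, 50))
```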

    Development of an Information Data Model for an Employee and Student Incentive System

    The article considers a system of incentives for employees and students. An entity-relationship diagram is constructed to capture all stages of the document workflow for this process. An example form from the developed information system for recording and analyzing the distribution of incentives awarded by employees to students is presented.
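
    Purely for illustration, the entities of such an entity-relationship model might be sketched as follows; the entity and field names are hypothetical assumptions and are not taken from the article.

```python
# Illustrative sketch only: hypothetical entities for an incentive-tracking data model.
from dataclasses import dataclass
from datetime import date

@dataclass
class Employee:
    employee_id: int
    name: str

@dataclass
class Student:
    student_id: int
    name: str

@dataclass
class Incentive:
    incentive_id: int
    awarded_by: int        # references Employee.employee_id
    awarded_to: int        # references Student.student_id
    reason: str
    awarded_on: date
```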

    A Temporary Pause in the Replication Licensing Restriction Leads to Rereplication during Early Human Cell Differentiation

    Gene amplifications in amphibians and flies are known to occur during development and have been well characterized, unlike in mammalian cells, where they are predominantly investigated as an attribute of tumors. Recently, we first described gene amplifications in human and mouse neural stem cells, myoblasts, and mesenchymal stem cells during differentiation. The mechanism leading to gene amplifications in amphibians and flies depends on endocycles and multiple origin firings. So far, there is no knowledge about a comparable mechanism in normal human cells. Here, we describe rereplication during the early myotube differentiation of human skeletal myoblast cells, using fiber combing and pulse-treatment with EdU (5-Ethynyl-2′-deoxyuridine)/CldU (5-Chloro-2′-deoxyuridine) and IdU (5-Iodo-2′-deoxyuridine)/CldU. We found rereplication during a restricted time window between 2 h and 8 h after differentiation induction. Rereplication was detected in cells simultaneously with the amplification of the MDM2 gene. Our findings support rereplication as a mechanism enabling gene amplification in normal human cells.

    Design and implementation of a platform for hyperconnected cyber physical systems

    The Internet of Things (IoT) is an area of growing importance as more and more computing capability becomes embedded into real-world objects and environments. At the same time, IoT is just one component of a widespread shift towards a new age of federation, combining with other trends such as cloud computing, blockchain and automation to create a new hyperconnected infrastructure. This infrastructure will emerge from the convergence of traditional, cloud and IoT-based models of computing, creating a more decentralised, secure and democratic computing platform for the future. But while bringing significant benefits, federation also brings significant problems, in particular the complexity of building, integrating and managing systems built using highly distributed and heterogeneous platforms. In this paper we discuss our work on modelling, deployment and management for this new converged computing environment, leveraging previous work on domain languages, cloud computing and the Web of Things to accelerate and democratise the development of real-world hyperconnected systems.

    Persisting right-sided chylothorax in a patient with chronic lymphocytic leukemia: a case report

    Introduction: Chylothorax caused by chronic lymphocytic leukemia is very rare, and the best therapeutic approach, especially the role of modern immunochemotherapy, is not yet defined. Case presentation: We present the case of a 65-year-old male Caucasian patient with right-sided chylothorax caused by a concomitantly diagnosed chronic lymphocytic leukemia. As first-line treatment, four cycles of an immunochemotherapy consisting of fludarabine, cyclophosphamide and rituximab were administered. In addition, our patient received total parenteral nutrition for the first two weeks of treatment. Despite the very good clinical response of the lymphoma to treatment, the chylothorax persisted and percutaneous radiotherapy of the thoracic duct was applied. However, eight weeks after the radiotherapy the chylothorax still persisted and our patient agreed to a surgical intervention. A ligation of the thoracic duct via a muscle-sparing thoracotomy was performed, resulting in a complete cessation of the pleural effusion. Apart from the first two weeks, our patient was treated on an out-patient basis for nearly six months. Conclusion: In this case of chylothorax caused by chronic lymphocytic leukemia, immunochemotherapy in combination with conservative treatment, and even consecutive radiotherapy, were not able to stop the pleural effusion, despite the very good clinical response of the chronic lymphocytic leukemia to treatment. Out-patient management using repetitive thoracocenteses can be safe as bridging until definitive surgical ligation of the thoracic duct.

    Reprogramming Low-end IoT Devices from the Cloud

    The Internet of Things (IoT) consists of a variety of smart connected objects, among which is a category of low-end devices based on microcontrollers. The orchestration of low-end IoT devices is not straightforward because of the lack of generic and holistic solutions articulating cloud-based tools on the one hand and low-end IoT device software on the other. In this paper, we describe such a solution, combining a cloud-based IDE, graphical programming, and automatic JavaScript generation. Scripts are pushed over the Internet, and over-the-air for the last hop, updating runtime containers hosted on heterogeneous low-end IoT devices running RIOT. We demonstrate a prototype working on common off-the-shelf low-end IoT hardware with as little as 32 kB of memory.
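
    The cloud-to-device push described above might look roughly like the client-side sketch below, which uploads a generated script to a script-container resource on a constrained device. The use of CoAP (via the aiocoap library), the resource path, and the device address are illustrative assumptions, not the paper's actual protocol or API.

```python
# Hedged sketch of the cloud-side push: upload a generated script to a hypothetical
# script-container resource on a low-end device over CoAP. Resource path, address,
# and payload are placeholders.
import asyncio
from aiocoap import Context, Message, PUT

SCRIPT = b"print('hello from the cloud');"      # stand-in for the generated JavaScript

async def push_script(device_uri):
    protocol = await Context.create_client_context()
    request = Message(code=PUT, payload=SCRIPT, uri=device_uri)
    response = await protocol.request(request).response
    print("Device replied:", response.code)

# Example invocation with a hypothetical device address and resource path.
asyncio.run(push_script("coap://[2001:db8::1]/container/script"))
```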

    Rich Magnetic Phase Diagram of Putative Helimagnet Sr$_3$Fe$_2$O$_7$

    The cubic perovskite SrFeO$_3$ was recently reported to host hedgehog- and skyrmion-lattice phases in a highly symmetric crystal structure which does not support the Dzyaloshinskii-Moriya interactions commonly invoked to explain such magnetic order. Hints of a complex magnetic phase diagram have also recently been found in powder samples of the single-layer Ruddlesden-Popper analog Sr$_2$FeO$_4$, so a reinvestigation of the bilayer material Sr$_3$Fe$_2$O$_7$, believed to be a simple helimagnet, is called for. Our magnetization and dilatometry studies reveal a rich magnetic phase diagram with at least 6 distinct magnetically ordered phases and strong similarities to that of SrFeO$_3$. In particular, at least one phase is apparently multiple-$\mathbf{q}$, and the $\mathbf{q}$s are not observed to vary among the phases. Since Sr$_3$Fe$_2$O$_7$ has only two possible orientations for its propagation vector, some of the phases are likely exotic multiple-$\mathbf{q}$ order, and it is possible to fully detwin all phases and more readily access their exotic physics. Comment: 14 pages, 13 figures.